STAT3926/STAT4026: Statistical Consulting

Survival Analysis

Authors

Prepared by: 510288769, 500199066

Prepared for: David Foxe

Published

May 17, 2024

Executive Summary

Background

Previously, the client used Kaplan-Meier survival curves to estimate survival across different types of dementia.

Client’s Aims

The client has two aims for this project. The first is to produce an interactive graph that provides a comprehensive exploration of the time from formal diagnosis to death based on four key covariates: age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance measured by the ACE-III test. These covariates were selected by the client based on their expertise and the availability of data. The second aim is to explore additional variables, including those from the Cambridge Behavioural Inventory-Revised carer questionnaire, family history of dementia as defined by a Goldman Score, and cognitive abilities. The client suspects these variables may influence survival rates and wants more clarity on the extent of the impact each factor has on survival.

Model

To meet the client’s objective of understanding how various covariates affect different dementia patient groups, we fitted a separate Cox Proportional Hazards regression model for each type of dementia. These models are the foundation of the interactive survival graph. This approach was informed by previous client analyses that highlighted statistically significant survival differences across dementia diagnoses. Each model accounts for age at diagnosis, sex, disease duration at diagnosis, and overall cognitive performance as assessed by the ACE-III test. Although the Cox regression model does not assume a particular distribution for the survival times, it does assume proportional hazards; the tests of this assumption are reported in the Appendix.
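
The following is a minimal sketch of how a fitted Cox model can be turned into survival curves for two patient groups. Under proportional hazards, a patient's survival curve is the baseline curve raised to the power exp(linear predictor). The baseline survival values, coefficients, and patient profiles below are hypothetical placeholders, not estimates from this report.

```python
import math

def cox_survival(baseline_survival, coefficients, covariates):
    """Return S(t | x) = S0(t) ** exp(beta . x) for each baseline value."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return [s0 ** hazard_ratio for s0 in baseline_survival]

# Hypothetical baseline survival at 1, 3 and 5 years after diagnosis.
baseline = [0.95, 0.70, 0.40]
# Hypothetical coefficients for (age at diagnosis, ACE-III total), both centred.
beta = [0.03, -0.02]

group_1 = cox_survival(baseline, beta, [5.0, 0.0])   # 5 years older than average
group_2 = cox_survival(baseline, beta, [0.0, 10.0])  # ACE-III 10 points above average

for year, s1, s2 in zip([1, 3, 5], group_1, group_2):
    print(f"year {year}: group 1 = {s1:.3f}, group 2 = {s2:.3f}")
```

A positive linear predictor raises the hazard and lowers the curve, so group 1 sits below group 2 at every time point in this illustration.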

Interactive Survival Graph

The interactive survival graph lets users enter the characteristics of two patient groups and compare how those characteristics affect survival. As shown in the graph, users can select patient attributes such as age at diagnosis, disease duration, ACE-III total score, and sex, along with a specific dementia diagnosis. When these parameters are adjusted, the graph updates the survival probabilities over time, enabling a detailed comparison between two distinct patient groups.

[Interactive survival graph: input panels for Patient Group 1 and Patient Group 2]

Significance of variables

Overall significance for all dementia diagnoses

Because the data contain no censored observations, we used the survreg function with a “lognormal” distribution to fit a parametric survival regression model.
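
With no censoring, the lognormal likelihood reduces to that of a normal linear model for log survival time, so the coefficient estimates coincide with ordinary least squares on log(time). The sketch below illustrates this with a single covariate in plain Python; it is a stand-in for illustration, not the survreg call used in the analysis, and the data are hypothetical.

```python
import math

def fit_lognormal_aft(times, x):
    """Fit log(T) = b0 + b1 * x + sigma * eps, eps ~ N(0, 1), with no censoring."""
    y = [math.log(t) for t in times]
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / n)  # maximum-likelihood estimate of the scale
    return b0, b1, sigma

# Hypothetical data: disease duration at diagnosis and diagnosis-to-death time (years).
duration = [1.0, 2.0, 3.0, 4.0, 5.0]
survival = [3.1, 3.9, 4.2, 5.6, 6.0]

b0, b1, sigma = fit_lognormal_aft(survival, duration)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, scale = {sigma:.3f}")
print(f"time ratio per extra year of duration = {math.exp(b1):.3f}")
```

In this parameterisation exp(b1) is a time ratio: the factor by which survival time is multiplied for each one-unit increase in the covariate.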

The dataset exhibits multicollinearity: some variables are highly correlated. For example, “ACE-III Total” is the total score on the Addenbrooke’s Cognitive Examination-III test, that is, the sum of all other ACE-III columns. Multicollinearity is likely to produce unreliable coefficient estimates that are difficult to interpret, so we used the Variance Inflation Factor (VIF) to detect it. A variable’s VIF measures how strongly it can be predicted from the other independent variables, and therefore how much the variance of its coefficient is inflated. To avoid these problems, we retained only variables with low VIF values. Because the retained variables are correlated with the omitted ones, they capture much of the same information, so little information from the dataset is lost.
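
As a minimal sketch of the VIF calculation, the code below uses the identity that the VIF of predictor j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The scores are hypothetical and the helper names are illustrative, not taken from the analysis code.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    centred = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises on a singular matrix."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical subscores, plus a total that is their exact sum.
attention = [1.0, 2.0, 3.0, 4.0]
fluency = [5.0, 3.0, 3.0, 5.0]
total = [a + f for a, f in zip(attention, fluency)]

print(vif([attention, fluency]))  # uncorrelated predictors: VIFs close to 1
try:
    vif([attention, fluency, total])
except ValueError as err:
    print("with the total included:", err)
```

Including a total alongside the subscores it sums makes the predictors perfectly collinear, which is the extreme case of the problem described above.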

We standardised the variables so that they are on the same scale. The magnitude of each coefficient then reflects the strength of that variable’s association with the dependent variable, “Diagnosis till death”. For the full model with all variables, a bar plot visualises the four most significant variables:
- Disease duration at diagnosis: the number of years the person had been experiencing symptoms before coming into the clinic.
- ACEIII::SubtotalFluency: the verbal fluency subscore of the ACE-III cognitive screening tool.
- ACEIII::SubtotalVisuospatial: the visuospatial subscore of the ACE-III cognitive screening tool.
- CBI::PercentSelfCareFCorrected: the percentage of the maximum achievable self-care score on the Cambridge Behavioural Inventory-Revised (CBI-R). Higher scores denote better performance in self-care.

In the bar plot, variables shown in orange have a positive coefficient (associated with a longer time from diagnosis to death), and variables shown in blue have a negative coefficient (a shorter time).
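
The scaling step can be sketched as a z-score transformation: subtract each variable's mean and divide by its standard deviation. The exact scaling used in the analysis is not stated, so the use of the sample standard deviation here is an assumption, and the values are hypothetical.

```python
import statistics

def standardise(values):
    """Rescale a variable to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return [(v - mean) / sd for v in values]

# Hypothetical raw values on very different scales.
age_at_diagnosis = [58.0, 63.0, 71.0, 66.0, 75.0]
fluency_subtotal = [4.0, 9.0, 6.0, 11.0, 7.0]

for name, column in [("age", age_at_diagnosis), ("fluency", fluency_subtotal)]:
    scaled = standardise(column)
    print(name, [round(v, 2) for v in scaled])
```

After this transformation each coefficient describes the effect of a one-standard-deviation change, which is what makes the coefficient magnitudes comparable across variables.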

More details for the full model are listed in the Appendix.

Significance of variables by dementia group


Dementia group             Observations
AD                         49
bvFTD                      61
CBS                        25
FTD-MND                    35
lvPPA                      27
nfvPPA                     29
nfvPPA + Parkinson’s plus  13
PSP                        21
svPPA                      21

The table above gives the number of observations in each of the nine dementia groups. The “nfvPPA + Parkinson’s plus”, “PSP”, and “svPPA” groups have too few observations (13, 21, and 21 respectively), so models fitted to them would be prone to overfitting: with so little data, a model is likely to be tailored to noise in the sample rather than the underlying pattern.
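
The sample-size check can be sketched as follows, using the group counts above and the 20 covariates that appear in each per-group table in the Appendix. The cutoff of 25 observations is a hypothetical value chosen only because it reproduces the split described above; the report does not state the threshold that was used.

```python
# Observations per dementia group, copied from the counts above.
counts = {
    "AD": 49, "bvFTD": 61, "CBS": 25, "FTD-MND": 35, "lvPPA": 27,
    "nfvPPA": 29, "nfvPPA + Parkinson's plus": 13, "PSP": 21, "svPPA": 21,
}

N_COVARIATES = 20      # covariate rows per group model (excluding intercept and log scale)
MIN_OBSERVATIONS = 25  # hypothetical cutoff, not stated in the report

kept = [group for group, n in counts.items() if n >= MIN_OBSERVATIONS]
dropped = [group for group, n in counts.items() if n < MIN_OBSERVATIONS]

for group, n in sorted(counts.items(), key=lambda item: item[1]):
    status = "modelled" if group in kept else "too few observations"
    print(f"{group}: {n} observations, {n / N_COVARIATES:.2f} per covariate ({status})")
```

Even the groups that are modelled have only a few observations per covariate, so their coefficient estimates should be read with caution.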

In addition, previous analysis showed that “Diagnosis_number” plays an important role, so we fitted a separate model for each dementia group with enough data. We plotted the significant variables for each group; the main findings are:

- AD: “ACEIII::SubtotalVisuospatial” has a positive effect, and “EducationYearsTotal” and “Age at diagnosis” have a negative effect on “Diagnosis till death”.
- bvFTD: “CBI::PercentMemory”, “ACEIII::SubtotalVisuospatial”, and “ACEIII::SubtotalFluency” have a positive effect, and “CBI::PercentSelfCare” has a negative effect on “Diagnosis till death”.
- CBS: Many variables affect “Diagnosis till death”. “ACEIII::SubtotalAttention” has a positive effect, and “CBI::PercentEating” and “CBI::PercentAbnormal” have a negative effect.
- FTD-MND: Many variables affect “Diagnosis till death”. “Sex (Male 1, Female 2)” and “Disease duration at diagnosis” have a positive effect, and “CBI::PercentAbnormal” has a negative effect.
- lvPPA: Many variables affect “Diagnosis till death”. “CBI::PercentSelfCare” and “ACEIII::SubtotalLanguage” have a positive effect, and “CBI::PercentAbnormal” has a negative effect.
- nfvPPA: “Sex (Male 1, Female 2)”, “CBI::PercentEating”, and “ACEIII::SubtotalFluency” have a positive effect, and “ACEIII::SubtotalAttention” has a negative effect on “Diagnosis till death”.

As in the previous bar plot, variables shown in orange have a positive effect and variables shown in blue have a negative effect. We conclude that “ACEIII::SubtotalVisuospatial”, “ACEIII::SubtotalFluency”, “CBI::PercentSelfCare”, “ACEIII::SubtotalAttention”, “CBI::PercentEating”, and “CBI::PercentAbnormal” are significant in several of the dementia groups.

More details for each dementia group model are listed in the Appendix.

Appendix

Testing Proportional Hazards

Covariate                      Chisq  df  p-value
Age at diagnosis                1.24   1    0.266
Sex                             1.35   1    0.246
Disease duration at diagnosis   0.17   1    0.682
ACE-III Total                   1.09   1    0.297
GLOBAL                          3.25   4    0.517

Covariate                      Chisq  df  p-value
Age at diagnosis                0.42   1    0.517
Sex                             6.61   1    0.010
Disease duration at diagnosis   0.41   1    0.523
ACE-III Total                   1.02   1    0.312
GLOBAL                          9.12   4    0.058

Covariate                      Chisq  df  p-value
Age at diagnosis                0.05   1    0.826
Sex                             1.88   1    0.170
Disease duration at diagnosis   0.01   1    0.933
ACE-III Total                   1.64   1    0.200
GLOBAL                          3.87   4    0.424

Covariate                      Chisq  df  p-value
Age at diagnosis                0.36   1    0.548
Sex                             0.05   1    0.824
Disease duration at diagnosis   0.84   1    0.358
ACE-III Total                   4.50   1    0.034
GLOBAL                          6.95   4    0.139

Covariate                      Chisq  df  p-value
Age at diagnosis                0.00   1    0.992
Sex                             0.12   1    0.731
Disease duration at diagnosis   2.92   1    0.088
ACE-III Total                   0.01   1    0.905
GLOBAL                          3.86   4    0.425

Covariate                      Chisq  df  p-value
Age at diagnosis                0.08   1    0.782
Sex                             1.50   1    0.220
Disease duration at diagnosis   4.85   1    0.028
ACE-III Total                   1.57   1    0.211
GLOBAL                         10.68   4    0.030

Covariate                      Chisq  df  p-value
Age at diagnosis                0.12   1    0.734
Sex                             0.05   1    0.832
Disease duration at diagnosis   1.25   1    0.263
ACE-III Total                   0.78   1    0.377
GLOBAL                          2.34   4    0.674

Covariate                      Chisq  df  p-value
Age at diagnosis                0.15   1    0.702
Sex                             0.07   1    0.796
Disease duration at diagnosis   0.14   1    0.705
ACE-III Total                   2.46   1    0.117
GLOBAL                          2.87   4    0.579

Covariate                      Chisq  df  p-value
Age at diagnosis                0.12   1    0.734
Sex                             0.00   1    0.972
Disease duration at diagnosis   0.39   1    0.530
ACE-III Total                   0.76   1    0.384
GLOBAL                          3.47   4    0.483
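
The p-values in these tables are upper-tail chi-square probabilities, and a small p-value is evidence against the proportional hazards assumption for that covariate (or, in the GLOBAL row, for the model as a whole). The sketch below recomputes a few of them from the reported statistics; the 0.05 threshold is a conventional choice assumed here, not one stated in the report.

```python
import math

def chisq_p_value(statistic, df):
    """Upper-tail chi-square probability for df = 1 or any even df."""
    half = statistic / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    raise ValueError("this sketch only handles df = 1 or even df")

# (covariate, chi-square, df) rows copied from the tables above.
rows = [
    ("Sex", 6.61, 1),
    ("ACE-III Total", 4.50, 1),
    ("Disease duration at diagnosis", 4.85, 1),
    ("GLOBAL", 9.12, 4),
    ("GLOBAL", 10.68, 4),
]

for name, statistic, df in rows:
    p = chisq_p_value(statistic, df)
    flag = "possible violation" if p < 0.05 else "no evidence of violation"
    print(f"{name} (chisq = {statistic}, df = {df}): p = {p:.3f}, {flag}")
```

Because the reported statistics are rounded to two decimals, the recomputed p-values can differ from the tabulated ones in the third decimal place.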

Results from all dementia diagnoses

Term                                     Coefficient  Std. Error      z value    p value
(Intercept)                                1.6287042   0.0991719   16.4230431  0.0000000
Diagnosis_nameFTD-MND                     -1.3149343   0.1465575   -8.9721390  0.0000000
Log(scale)                                -0.6053870   0.0421825  -14.3516303  0.0000000
Diagnosis_namenfvPPA + Parkinson’s plus   -0.2807378   0.1883152   -1.4907868  0.1360175
Diagnosis_namelvPPA                       -0.2083461   0.1400319   -1.4878470  0.1367912
Diagnosis_namenfvPPA                      -0.1942113   0.1576234   -1.2321225  0.2179033
ACEIII::SubtotalFluency                    0.1758444   0.0477791    3.6803610  0.0002329
ACEIII::SubtotalVisuospatial               0.1300042   0.0559673    2.3228600  0.0201867
CBI::PercentSelfCareFCorrected            -0.1291601   0.0496838   -2.5996408  0.0093321
Sex (Male 1, Female 2) 2                   0.1117217   0.0726728    1.5373244  0.1242139
CBI::PercentMemoryFCorrected               0.1082240   0.0557348    1.9417675  0.0521652
ACEIII::SubtotalMemory                    -0.0980801   0.0644657   -1.5214318  0.1281515
Diagnosis_namebvFTD                       -0.0789263   0.1392427   -0.5668258  0.5708326
Disease duration at diagnosis              0.0776235   0.0361458    2.1475100  0.0317527
CBI::PercentMotivationFCorrected          -0.0761239   0.0533105   -1.4279338  0.1533109
Age at diagnosis                          -0.0621485   0.0376262   -1.6517348  0.0985886
CBI::PercentAbnormalFCorrected            -0.0554625   0.0575207   -0.9642179  0.3349367
Diagnosis_namesvPPA                        0.0506025   0.1866061    0.2711727  0.7862582
Diagnosis_namePSP                         -0.0503900   0.1746879   -0.2884574  0.7729967
ACEIII::SubtotalAttention                  0.0499284   0.0620538    0.8045995  0.4210508
ClinicalAssessment::GoldmanScore           0.0471112   0.0355100    1.3267019  0.1846073
CBI::PercentSleepFCorrected               -0.0468760   0.0396049   -1.1835893  0.2365757
Diagnosis_nameCBS                         -0.0467270   0.1698713   -0.2750732  0.7832600
CBI::PercentEverydayFCorrected            -0.0387882   0.0580971   -0.6676443  0.5043607
CBI::PercentMoodFCorrected                 0.0335674   0.0504296    0.6656296  0.5056479
CBI::PercentBeliefsFCorrected              0.0168583   0.0374950    0.4496140  0.6529888
EducationYearsTotal                        0.0100432   0.0371356    0.2704470  0.7868164
CBI::PercentStereotypicalFCorrected        0.0074927   0.0546087    0.1372075  0.8908668
ACEIII::SubtotalLanguage                  -0.0072250   0.0622213   -0.1161174  0.9075595
CBI::PercentEatingFCorrected              -0.0030126   0.0564909   -0.0533295  0.9574693
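
The columns of this table are related in a simple way: the z value is the coefficient divided by its standard error, and the p value is the two-sided normal tail probability of that z value. The sketch below recomputes these for three rows of the table and adds the time ratio exp(coefficient), which for a lognormal model is the multiplicative change in survival time per one-unit (here, one standard deviation) increase in the variable. The function name and the orange/blue labelling are illustrative.

```python
import math

def wald_summary(coefficient, std_error):
    """Return the Wald z value, two-sided p value and time ratio for one term."""
    z = coefficient / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    time_ratio = math.exp(coefficient)
    return z, p, time_ratio

# (coefficient, standard error) pairs copied from the table above.
rows = {
    "ACEIII::SubtotalFluency": (0.1758444, 0.0477791),
    "CBI::PercentSelfCareFCorrected": (-0.1291601, 0.0496838),
    "Disease duration at diagnosis": (0.0776235, 0.0361458),
}

for term, (coefficient, std_error) in rows.items():
    z, p, time_ratio = wald_summary(coefficient, std_error)
    colour = "orange" if coefficient > 0 else "blue"
    print(f"{term}: z = {z:.4f}, p = {p:.7f}, "
          f"time ratio = {time_ratio:.3f}, bar colour = {colour}")
```

A time ratio above 1 corresponds to a longer expected time from diagnosis to death, and a ratio below 1 to a shorter one.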

Results from each dementia diagnosis

Dementia group: AD

Term                           Coefficient  Std. Error      z value    p value
(Intercept)                      1.9750823   0.1304882   15.1361021  0.0000000
Log(scale)                      -1.0695636   0.1010153  -10.5881398  0.0000000
ACEIII::SubtotalVisuospatial     0.3234895   0.0731973    4.4194162  0.0000099
CBI::PercentEating               0.2350261   0.1710496    1.3740229  0.1694346
Age at diagnosis                -0.2089849   0.0555404   -3.7627564  0.0001681
CBI::PercentSelfCare            -0.1704212   0.1295625   -1.3153584  0.1883895
EducationYearsTotal             -0.1431628   0.0674650   -2.1220319  0.0338351
Sex (Male 1, Female 2) 2         0.1286976   0.1178721    1.0918412  0.2749029
ACEIII::SubtotalLanguage        -0.1268934   0.1546210   -0.8206738  0.4118321
Disease duration at diagnosis   -0.1220367   0.0729627   -1.6725902  0.0944080
ACEIII::SubtotalAttention        0.1068610   0.1215558    0.8791107  0.3793413
CBI::PercentAbnormal            -0.0803797   0.1685262   -0.4769568  0.6333929
ACEIII::SubtotalFluency         -0.0736457   0.0967825   -0.7609406  0.4466925
GoldmanScore                     0.0730915   0.1044951    0.6994732  0.4842564
CBI::PercentMotivation          -0.0712989   0.1096646   -0.6501540  0.5155927
CBI::PercentSleep               -0.0653759   0.0751421   -0.8700306  0.3842837
CBI::PercentStereotypical       -0.0629505   0.1336806   -0.4709021  0.6377107
ACEIII::SubtotalMemory           0.0517398   0.1237023    0.4182605  0.6757567
CBI::PercentMemory              -0.0411183   0.1035539   -0.3970718  0.6913145
CBI::PercentMood                 0.0392490   0.0953593    0.4115904  0.6806397
CBI::PercentBeliefs             -0.0282680   0.0972206   -0.2907617  0.7712336
CBI::PercentEveryday             0.0050338   0.1057047    0.0476213  0.9620180

Dementia group: bvFTD

Term                           Coefficient  Std. Error      z value    p value
(Intercept)                      1.5303344   0.1170029   13.0794614  0.0000000
Log(scale)                      -0.8642163   0.0905357   -9.5455812  0.0000000
ACEIII::SubtotalVisuospatial     0.3778851   0.1298433    2.9103170  0.0036106
Sex (Male 1, Female 2) 2         0.2886560   0.1653018    1.7462366  0.0807698
ACEIII::SubtotalFluency          0.2565609   0.0770503    3.3297847  0.0008691
CBI::PercentMemory               0.2381783   0.0976174    2.4399175  0.0146906
CBI::PercentMotivation          -0.1902755   0.1033317   -1.8414041  0.0655624
CBI::PercentSelfCare            -0.1681407   0.0732513   -2.2953961  0.0217104
CBI::PercentEveryday             0.1342125   0.1096458    1.2240559  0.2209312
CBI::PercentMood                 0.1280998   0.0833549    1.5367993  0.1243425
CBI::PercentAbnormal            -0.1038375   0.0903971   -1.1486811  0.2506875
Age at diagnosis                -0.0719060   0.0708841   -1.0144167  0.3103840
Disease duration at diagnosis    0.0638477   0.0551819    1.1570405  0.2472558
ACEIII::SubtotalMemory           0.0593544   0.1088862    0.5451044  0.5856817
ACEIII::SubtotalLanguage        -0.0566174   0.1236980   -0.4577070  0.6471629
CBI::PercentStereotypical       -0.0359267   0.0909457   -0.3950342  0.6928176
CBI::PercentBeliefs             -0.0344206   0.0472027   -0.7292072  0.4658749
EducationYearsTotal             -0.0305323   0.0749984   -0.4071060  0.6839301
GoldmanScore                     0.0272971   0.0433647    0.6294791  0.5290355
CBI::PercentSleep               -0.0235812   0.0687799   -0.3428496  0.7317116
ACEIII::SubtotalAttention       -0.0116516   0.1224109   -0.0951844  0.9241683
CBI::PercentEating              -0.0043376   0.0854637   -0.0507537  0.9595217

Dementia group: CBS

Term                           Coefficient  Std. Error      z value    p value
Log(scale)                      -2.5074648   0.1414214  -17.7304536  0.0000000
(Intercept)                      1.2601504   0.0813703   15.4866045  0.0000000
ACEIII::SubtotalAttention        0.6429914   0.0924133    6.9577794  0.0000000
CBI::PercentAbnormal            -0.6081172   0.1485806   -4.0928443  0.0000426
CBI::PercentEating              -0.5693867   0.2192432   -2.5970548  0.0094027
Age at diagnosis                -0.3749717   0.0397285   -9.4383467  0.0000000
ACEIII::SubtotalFluency         -0.3596098   0.0783762   -4.5882543  0.0000045
ACEIII::SubtotalLanguage         0.2561218   0.0901964    2.8396030  0.0045170
Disease duration at diagnosis    0.2549515   0.0475947    5.3567202  0.0000001
ACEIII::SubtotalMemory          -0.2291749   0.1057606   -2.1669206  0.0302409
ACEIII::SubtotalVisuospatial    -0.1924689   0.0468063   -4.1120264  0.0000392
CBI::PercentMotivation           0.1909696   0.0645523    2.9583709  0.0030927
CBI::PercentMemory               0.1740816   0.0795944    2.1871083  0.0287346
CBI::PercentStereotypical        0.1740530   0.1266023    1.3748014  0.1691930
CBI::PercentBeliefs              0.1633968   0.0623610    2.6201742  0.0087885
CBI::PercentSleep                0.1619915   0.0562377    2.8804782  0.0039707
GoldmanScore                     0.1399078   0.0498079    2.8089447  0.0049704
CBI::PercentEveryday            -0.1254057   0.0928637   -1.3504274  0.1768789
Sex (Male 1, Female 2) 2        -0.0498895   0.0914201   -0.5457171  0.5852604
CBI::PercentSelfCare             0.0191061   0.0787067    0.2427508  0.8081984
EducationYearsTotal              0.0189151   0.0368374    0.5134742  0.6076196
CBI::PercentMood                 0.0076438   0.0579347    0.1319384  0.8950330

Dementia group: FTD-MND

Term                           Coefficient  Std. Error      z value    p value
Disease duration at diagnosis    1.2358327   0.1325214    9.3255344  0.0000000
Log(scale)                      -1.1623290   0.1195229   -9.7247417  0.0000000
Sex (Male 1, Female 2) 2         0.9278389   0.2189638    4.2374072  0.0000226
CBI::PercentAbnormal            -0.7573791   0.1153260   -6.5672913  0.0000000
(Intercept)                      0.7440880   0.1048109    7.0993354  0.0000000
ACEIII::SubtotalFluency          0.5266696   0.0878011    5.9984373  0.0000000
CBI::PercentSelfCare            -0.4768137   0.1604920   -2.9709504  0.0029688
CBI::PercentEveryday             0.4269279   0.1913710    2.2308912  0.0256883
CBI::PercentSleep                0.3911099   0.0964967    4.0530894  0.0000505
CBI::PercentMotivation           0.3800288   0.1054487    3.6039195  0.0003135
CBI::PercentStereotypical        0.3773127   0.1193792    3.1606229  0.0015743
ACEIII::SubtotalVisuospatial    -0.3649682   0.1723536   -2.1175557  0.0342127
CBI::PercentEating              -0.2980087   0.1297252   -2.2972311  0.0216056
GoldmanScore                    -0.2946542   0.0716841   -4.1104531  0.0000395
ACEIII::SubtotalLanguage         0.2907866   0.1230162    2.3638076  0.0180882
ACEIII::SubtotalMemory           0.2434248   0.1375060    1.7702854  0.0766796
Age at diagnosis                -0.2315923   0.1109884   -2.0866352  0.0369211
CBI::PercentMood                 0.2129421   0.1105221    1.9266921  0.0540180
CBI::PercentBeliefs             -0.1845847   0.1034678   -1.7839817  0.0744266
EducationYearsTotal              0.1306507   0.0952751    1.3712994  0.1702816
CBI::PercentMemory              -0.0740127   0.1231594   -0.6009502  0.5478731
ACEIII::SubtotalAttention       -0.0224537   0.1301940   -0.1724635  0.8630732

Dementia group: lvPPA

Term                           Coefficient  Std. Error      z value    p value
(Intercept)                      1.8026096   0.1209581   14.9027656  0.0000000
Log(scale)                      -1.6674571   0.1360828  -12.2532574  0.0000000
CBI::PercentSelfCare             1.4020805   0.2238965    6.2621819  0.0000000
CBI::PercentAbnormal            -0.6431153   0.1610063   -3.9943497  0.0000649
ACEIII::SubtotalLanguage         0.6034872   0.0958470    6.2963588  0.0000000
Age at diagnosis                -0.5169174   0.0752091   -6.8730720  0.0000000
ACEIII::SubtotalMemory          -0.3792927   0.1358856   -2.7912649  0.0052502
Sex (Male 1, Female 2) 2         0.3643523   0.1147199    3.1760158  0.0014931
Disease duration at diagnosis    0.3573427   0.1005937    3.5523380  0.0003818
CBI::PercentStereotypical        0.3484087   0.1180543    2.9512583  0.0031648
CBI::PercentMood                -0.3461832   0.1049336   -3.2990700  0.0009701
CBI::PercentEveryday            -0.3000232   0.0746338   -4.0199351  0.0000582
ACEIII::SubtotalFluency          0.2770906   0.0874911    3.1670741  0.0015398
GoldmanScore                    -0.2643619   0.0837359   -3.1570927  0.0015935
CBI::PercentMemory              -0.2117017   0.1189139   -1.7802942  0.0750278
CBI::PercentBeliefs             -0.1757078   0.1523796   -1.1530929  0.2488722
CBI::PercentSleep               -0.1253179   0.0928932   -1.3490534  0.1773198
ACEIII::SubtotalAttention        0.0785194   0.1326390    0.5919781  0.5538653
CBI::PercentMotivation           0.0647494   0.1056918    0.6126246  0.5401246
EducationYearsTotal              0.0539759   0.0587912    0.9180949  0.3585692
ACEIII::SubtotalVisuospatial    -0.0466682   0.1493251   -0.3125279  0.7546394
CBI::PercentEating               0.0387899   0.1308310    0.2964889  0.7668568

Dementia group: nfvPPA

Term                           Coefficient  Std. Error      z value    p value
(Intercept)                      1.3547831   0.3874500    3.4966660  0.0004711
Log(scale)                      -1.0773595   0.1313064   -8.2049254  0.0000000
CBI::PercentEating               1.0274441   0.4081958    2.5170376  0.0118346
ACEIII::SubtotalAttention       -0.7660366   0.3165456   -2.4199878  0.0155210
CBI::PercentSelfCare            -0.5565359   0.6558621   -0.8485563  0.3961282
ACEIII::SubtotalFluency          0.5051080   0.1546901    3.2652894  0.0010935
CBI::PercentAbnormal            -0.4995772   0.2653886   -1.8824364  0.0597768
Sex (Male 1, Female 2) 2         0.4792632   0.1926135    2.4882116  0.0128387
Disease duration at diagnosis   -0.2879376   0.1480755   -1.9445316  0.0518314
ACEIII::SubtotalVisuospatial     0.2876360   0.2855660    1.0072491  0.3138151
CBI::PercentMood                 0.2815494   0.1576189    1.7862668  0.0740561
CBI::PercentStereotypical       -0.2402609   0.2051067   -1.1713945  0.2414402
CBI::PercentMotivation          -0.2310093   0.1794759   -1.2871324  0.1980481
ACEIII::SubtotalMemory           0.1961746   0.2373103    0.8266587  0.4084305
CBI::PercentBeliefs              0.1676204   0.9042686    0.1853658  0.8529421
ACEIII::SubtotalLanguage        -0.1363656   0.2832484   -0.4814346  0.6302077
GoldmanScore                     0.1231585   0.2982228    0.4129749  0.6796250
CBI::PercentSleep                0.0974343   0.1751731    0.5562172  0.5780624
Age at diagnosis                -0.0701088   0.1005063   -0.6975563  0.4854547
CBI::PercentMemory               0.0644035   0.1824586    0.3529758  0.7241066
EducationYearsTotal             -0.0228537   0.1797444   -0.1271453  0.8988254
CBI::PercentEveryday            -0.0154270   0.2513615   -0.0613739  0.9510614

Contribution

Linh Trinh - Analysis and Report writing. Betty Zhao - Analysis and Report writing.